forecasting and monitoring disasters as well as natural phenomena. This paper 
operations to identify and simultaneously predict threat proceedings in natural 
data that must be warned. This can save energy consumption for other areas 
of safety. This work shows promising points of effectiveness. 

1. INTRODUCTION 

Wireless sensor networks (WSNs) support many applications in different fields, in both civilian and 
military areas [1], [2]. The networks usually consist of a huge number of sensor nodes deployed in 
working areas. The sensor nodes collect data to be sent to a base station (BS) or data processing center for 
different purposes, including detection, monitoring, building scalar field maps, and early warning of events 
[3]-[5]. There are many data collection methods in WSNs. Each routing method has specific advantages and is 
suitable for particular tasks. In random walks, all sensors are considered equal [6], [7]: each 
node forwards its data to one of its neighbors based on a probability. This method may increase the latency of data 
transmission in such networks. In tree-based routing [8], [9], sensor nodes transmit their data to parent nodes 
until it reaches the root at the BS. The routing paths are usually fixed but can be affected by 
disconnections when transmitting across a vast area with many nodes. Clustering methods are quite common 
and stable since both the cluster heads and the inter-cluster routing paths can be changed [10], [11]. 
Hence, clustering methods are usually deployed to support WSNs in detecting critical or essential events. 

Natural disasters can be considered sudden events that negatively impact the balance 
between natural and social systems [12]. The frequent occurrence of extreme weather events in recent years 
has caused severe losses in many aspects of life [13]. According to statistics from Our World in Data, 
natural disasters cause an average of 60,000 deaths per year, accounting for 0.1% of global 
deaths over the past decade [14]. Fortunately, such sudden events can be predicted with the support 
of early warning systems. 

Warning system platforms are built to meet the following three primary standards: 
i) scientificity of the system, ii) reliability of data, and iii) accuracy of early warning. With respect to the 
reliability of data, sensors play an important role in providing accurate early warnings because they can 
capture more detailed and reliable big data about the ambient natural environment [15]-[18]. Scientificity 
starts from the correlation between high-risk factors and involves coordination and 
cooperation with other subsystems, using scientific methods and accurate data statistics. Meanwhile, accuracy 
and real-time capability can be achieved by analyzing and processing data with technologies such as artificial 
intelligence and deep learning [19], [20]. Specific examples are the artificial neural network considered for the 
analysis of a flood management system in [21] and practically implemented in forecasting the Blue Nile river flows in 
Sudan [22]. The overview chart for earthquake situational analytics in [23] uses data visualization to present the 
conditions across the area around the earthquake zone. 

Data visualization systems provide a visual representation of the dataset, allowing users to analyze 
data and discover new knowledge via exploratory analysis. Based on dataset types, attributes, and domain 
tasks, visualization designers come up with a specific design in terms of color coding, layout, and other visual 
channels such as angle, shape, and size [24], [25]. The purpose is to turn data sources into visual, 
easy-to-observe, and easy-to-understand information that clearly conveys the data's insights to the viewer. 
Therefore, this technique can provide an intuitive tool for finding correlations between the cycles of natural 
disasters, through which people can make accurate judgments when issuing disaster warnings. 

This study proposes a new direction for applying data visualization techniques to the prediction of 
natural disasters. The WSN is modeled with numerous sensor nodes deployed in a sensing area. A clustering 
algorithm partitions the network into clusters. A greedy tree algorithm is proposed to collect data from the 
clusters and send it to the BS. Sensing data is gathered in a database for analysis and processing. 
After the data is processed, a visualization technique is applied to support visual inspection of changes in the 
sensing areas. Unusual changes are early signs of disaster in regions of interest (ROIs). These early warning signs 
help people living in those areas focus on disaster prevention and minimize loss of life and 
property. 

The analysis of the early warning system (EWS) is organized as follows. Section 2 introduces the 
system model of the proposed warning system. On the basis of this, section 3 describes the preliminary results 
of the research, while section 4 discusses recommendations and challenges for future work. Finally, 
conclusions and future directions of this study are provided in section 5. 


2. SYSTEM MODEL 
2.1. The wireless sensor network (WSN) model 

The WSN is assumed to contain N sensor nodes distributed randomly in a sensing region. The 
network's goal is to gather data from all nodes and send it to a data processing center or a BS. To conserve 
energy, the network is separated into clusters, with one cluster head (CH) assigned to each cluster 
by a clustering algorithm. A tree-based routing technique is developed to connect all of the CHs with the root 
at the BS. Based on this routing tree, the CHs collect data from their local clusters and 
then pass their sensor readings through the CHs closer to the BS. The BS can thus obtain all the data from the 
network via the inter-cluster tree. Both the clustering algorithm and the tree-based routing algorithm serve 
to collect sensing data and deliver it to the BS. Algorithm 1 divides the network into clusters. Algorithm 
2 provides a routing tree that carries all sensing data from the clusters to the BS. The detailed steps are as 
follows. 


2.1.1. Clustering Algorithm 

In Algorithm 1, a certain number of sensors are chosen as CHs. The non-CH sensors send their data 
to the CHs they belong to. The CHs collect data within their clusters and send the received data, together with 
their own, to the BS via the inter-cluster routing tree. The clustering steps are as follows. 


Algorithm 1. Clustering algorithm 
1- All sensors are nominated to be CHs with probability p = M/N; 
2- M sensors are chosen as cluster heads; 
3- The remaining sensors join their closest CHs to create M clusters. 
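
The clustering steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `form_clusters`, the 100 x 100 area, the node count, and the seed are all assumptions made for the example. Drawing M distinct heads uniformly at random is used here as a simple stand-in for the probabilistic nomination in step 1.

```python
import math
import random

def form_clusters(nodes, m, rng):
    """Pick m cluster heads at random and attach every other node to its closest head."""
    # Assumption: sampling m distinct indices uniformly stands in for the
    # probabilistic CH nomination followed by the selection of M heads.
    head_ids = rng.sample(range(len(nodes)), m)
    clusters = {h: [] for h in head_ids}
    for i, pos in enumerate(nodes):
        if i in clusters:
            continue  # cluster heads do not join another cluster
        closest = min(head_ids, key=lambda h: math.dist(pos, nodes[h]))
        clusters[closest].append(i)
    return clusters

# Hypothetical deployment: 200 sensors placed uniformly in a 100 x 100 area.
rng = random.Random(0)
nodes = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
clusters = form_clusters(nodes, m=20, rng=rng)
```

The result maps each cluster head index to the list of member node indices, which is the grouping that intra-cluster communication relies on.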


2.1.2. Greedy tree algorithm 

In Algorithm 2, the routing tree is created. Each CH chooses to connect to the neighboring CH closest to the BS. 
Then, the CHs can start forwarding their data to the ones closer to the BS, as shown in Figure 1. 


Algorithm 2. Greedy multiple-hop tree formation (GMT) 
While (the routing paths are changing) 
    N_h(BS) = 0; i ∈ N_CHs 
    Nei = set of CH_i's neighbors 
    if distance[i, j] < R, where j ∈ Nei 
        CH(i) chooses CH(j) when N_h(j) = min{N_h(Nei)} 
        Set N_h(i) = N_h(j) + 1; 
    end if 
end while (until no change of routing paths between CHs) 
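
One possible reading of the greedy tree formation is sketched below in Python. It is an illustration under stated assumptions rather than the authors' code: the function name `greedy_multihop_tree`, the use of -1 to denote the BS as a parent, and the sample coordinates are hypothetical. Each CH repeatedly picks the in-range neighbor with the smallest hop count to the BS until no routing path changes.

```python
import math

def greedy_multihop_tree(chs, bs, radio_range):
    """Return (parent, hops) per cluster head; parent -1 is the BS, None means unreachable."""
    n = len(chs)
    hops = [math.inf] * n
    parent = [None] * n
    changed = True
    while changed:  # repeat until the routing paths stop changing
        changed = False
        for i in range(n):
            # Candidate next hops: the BS (hop count 0) or any CH within radio range.
            best_hops, best_parent = math.inf, None
            if math.dist(chs[i], bs) < radio_range:
                best_hops, best_parent = 0, -1
            for j in range(n):
                if j != i and math.dist(chs[i], chs[j]) < radio_range and hops[j] < best_hops:
                    best_hops, best_parent = hops[j], j
            if best_hops + 1 < hops[i]:
                hops[i], parent[i] = best_hops + 1, best_parent
                changed = True
    return parent, hops

# Hypothetical layout: three CHs in a line toward the BS and one isolated CH.
chs = [(10, 0), (20, 0), (30, 0), (100, 100)]
parent, hops = greedy_multihop_tree(chs, bs=(0, 0), radio_range=12)
```

In this layout the first CH connects directly to the BS, the next two chain through their neighbors, and the isolated CH stays unconnected because no node lies within its range.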


Figure 1. Illustration of the WSN system model, including intra-cluster and inter-cluster communications 
(legend: cluster head (CH), non-CH sensor, base station (BS), intra-cluster communications, inter-cluster 
communications) 


2.2. The early warning system (EWS) model 


As presented in Figure 2, the EWS model has four main processes: data collection, 
data processing, prediction, and decision-making. The structure of the EWS consists of 
four subsystems, as follows: 


WSNs subsystem: A set of sensor nodes is randomly deployed in sensing areas such as high-rise 
buildings, mountainous areas, hydroelectric dams, or areas at high risk of tsunamis and earthquakes. They 
collect data on humidity, precipitation, air quality, and geological movements. The data is transmitted to 
the database through wireless communication technologies, which have the potential to provide reliable 
data transmission [26]. 

Database: Big data techniques are used to manage and analyze multi-platform, multi-scale, and 
multi-discipline data [27]. Sensors might generate heterogeneous data that may even contain noise and 
misinformation. Therefore, the system needs to integrate multiple sources and process the data, aiming to 
enhance the quality and completeness of the data. 

Data visualization: Data visualization provides a visual representation of data and information through 
different visual encodings, such as graphical elements and visual channels. The human brain tends 
to take in visual information much more easily than other forms. The combination of data 
visualization and machine learning, discussed in [28], can learn and predict potential natural threats and 
help formulate strategies to mitigate losses. 

Regions of interest (ROIs): With the output of the data visualization, the sensing data at the alert 
level is shown to the viewer. Viewers can then drill down to see where the problems are occurring and start 
formulating a plan to solve them. In this way, data visualization allows users to spot 
problems immediately and make logical decisions when predicting the ROIs. 

Figure 2. The system model of the early warning system using WSNs and data visualization 


3. ANALYSIS AND SIMULATION RESULT 

We propose a visualization-based framework, as shown in Algorithm 3, supporting monitoring in 
early warning systems, where the data collected by wireless sensor networks are either univariate or 
multivariate. Although the data characteristics may vary due to spatial processes and cross-correlation within 
the observed data, the difference in prediction results between the two groups is insignificant in terms of 
effectively monitoring multiple spatial random fields. Utilizing K-means clustering can support capturing 
highly unusual events characterized by experimental measurements. Furthermore, visualization supports an 
interactive, intuitive visual representation for monitoring and analysis in a timely manner. 

We assume N sensors are deployed randomly in a sensing area. The sensing data received from all 
sensor nodes are sent to the central database. In the univariate case, sensors send data on the environment 
temperature (T) to the database, while in the multivariate case, the data may contain both temperature and 
humidity readings. The aim is to distinguish between danger levels so those living near a dangerous area can 
prepare for evacuation and promptly seek sanctuary. 


Algorithm 3. Framework for data analysis 

Phase 1. Data collection: the original data collected from spatial fields are gathered 
at a central database. The maximum feasible dimension of data is defined at this stage 
along with sensor selection. 
Phase 2. Data preprocessing and data wrangling: As raw data often contain noise and 
redundancy, this stage focuses on cleaning the data and extracting meaningful features. Data 
wrangling helps transform complex data into a useful, desired format for easy access, 
visualization, and analysis. 
Phase 3. Data clustering: This stage focuses on detecting natural partitions 
among the data. For example, for a univariate dataset in which the observation is the environment 
temperature: 

If 0 < T < 35°C → Cluster 1 → Green (Normal) 

35°C < T < 45°C → Cluster 2 → Yellow (Moderate risk) 

45°C < T < 55°C → Cluster 3 → Orange (High risk) 

55°C < T → Cluster 4 → Red (Very high risk) 
Phase 4. Data visualization: Using visual encodings, data is visualized for an 
intuitive, straightforward representation. For example, one facet of the original data is 
presented in Figure 3(a), and the simulation results are visualized in Figure 3(b). 
Phase 5. Domain knowledge incorporated for analysis: Users can filter and explore 
the warning data clusters, as in Figure 4(a). With interaction, domain knowledge is applied for 
further analysis, with an in-depth view into the clusters that are especially dangerous, 
as in Figure 4(b). 
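
The example thresholds in Phase 3 and the two filtering steps in Phase 5 can be illustrated with a short Python sketch. The names `classify_temperature` and `filter_by_level` and the sample readings are hypothetical. The framework does not state how readings that fall exactly on a boundary, or at or below 0°C, are handled, so the choices made here are assumptions.

```python
# Upper bounds (°C) for the first three levels, taken from the Phase 3 example.
# Assumption: a reading exactly on a boundary falls into the higher level,
# and readings at or below 0 °C are treated as Normal.
RISK_LEVELS = [
    (35.0, 1, "Green", "Normal"),
    (45.0, 2, "Yellow", "Moderate risk"),
    (55.0, 3, "Orange", "High risk"),
]

def classify_temperature(t_celsius):
    """Return (cluster, color, label) for one temperature reading."""
    for upper, cluster, color, label in RISK_LEVELS:
        if t_celsius < upper:
            return cluster, color, label
    return 4, "Red", "Very high risk"

def filter_by_level(readings, min_cluster):
    """Keep only the (sensor_id, temperature) pairs at or above a risk cluster."""
    return [(sid, t) for sid, t in readings if classify_temperature(t)[0] >= min_cluster]

# Made-up sample readings: (sensor id, temperature in °C).
readings = [("s1", 22.0), ("s2", 38.5), ("s3", 47.0), ("s4", 58.2), ("s5", 61.0)]
warning = filter_by_level(readings, 3)   # first filter: orange and red areas
critical = filter_by_level(warning, 4)   # second filter: red area only
```

Applying the second filter to the output of the first mirrors the drill-down from the warning areas to the most severe ones.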


Suppose we are given a data set $X = \{x_1, \ldots, x_N\}$, $x_n \in \mathbb{R}^d$. The M-clustering problem aims at 
partitioning this data set into M disjoint subsets (clusters) $C_1, \ldots, C_M$, such that a clustering criterion is optimized. 
The most widely used clustering criterion is the sum of the squared Euclidean distances between each data 
point $x_i$ and the centroid $m_k$ (cluster center) of the subset $C_k$ which contains $x_i$. This criterion is called the clustering 
error and depends on the cluster centers $m_1, \ldots, m_M$ [29]: 

$$E(m_1, \ldots, m_M) = \sum_{i=1}^{N} \sum_{k=1}^{M} I(x_i \in C_k) \, \lVert x_i - m_k \rVert^2 , \quad (1)$$

where $I(X) = 1$ if $X$ is true and 0 otherwise. 

3.1. The most common measure: sum of squared error (SSE) 
For each point, the error is the distance to the nearest cluster center. To get the SSE, we square these errors and 
sum them: 

$$\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x) , \quad (2)$$

where $x$ is a data point in cluster $C_i$ and $m_i$ is the representative point for cluster $C_i$. 
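
As a small worked example of the SSE, the Python sketch below sums the squared Euclidean distances between each point and the representative point of its cluster. The function name `sse` and the toy clusters are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def sse(clusters, centers):
    """Sum of squared Euclidean distances from each point to its cluster's representative point."""
    total = 0.0
    for points, center in zip(clusters, centers):
        for point in points:
            total += sum((a - b) ** 2 for a, b in zip(point, center))
    return total

# Toy data: two clusters with their centroids.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0)]]
centers = [(1.0, 0.0), (10.0, 10.0)]
error = sse(clusters, centers)  # 1 + 1 + 0 = 2.0
```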

Figure 3 visualizes the sensor network data with and without the clustering algorithm. Figure 3(a) 
shows the WSN data without the data clustering algorithm; it is difficult to determine which areas hold 
critical data and which hold normal data. Figure 3(b) shows the data after processing 
with the data clustering algorithm. However, this gives only an overview of the data, from which initial 
predictions can be made. We need to visualize the data in more detail for each location and focus only on 
the alert data areas. To do this, we use data filters, as shown in Figure 4. Figure 4(a) shows the data after one 
filter; two data areas remain, the orange warning area and the red, especially severe warning area. To 
prioritize the particularly severe warning zones and minimize observation and analysis time, we use a second 
filter to observe only the red zone data, as shown in Figure 4(b). Thus, the data filtering process clarifies 
the visualization and considerably reduces the time needed for observation and data analysis. We can quickly 
locate the critical warning zone and therefore provide timely responses to minimize damage. 

Figure 3. Visualizing simulation data: (a) original univariate data and (b) with the K-means algorithm 

Figure 4. Interaction with filtering and zooming for further analysis: (a) first filter and (b) second 
filter 


We utilize the elbow method to determine the optimal number of clusters for the dataset, as 
demonstrated in Figure 5. When K = 1, the SSE is the largest, and it decreases as K increases. This 
means that the data partitioning fits the data more closely as K grows. However, once K 
reaches a certain value, the improvement is no longer significant. In this case, we could choose K = 4, 
since the relative improvement beyond this point is small. 
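
The elbow procedure can be sketched in plain Python as follows. This is only an illustration under assumptions: it uses a simple one-dimensional Lloyd-style K-means with a deterministic initialization, and the temperature values are synthetic. The function name `kmeans_sse_1d` is hypothetical and the sketch does not reproduce the simulation setup behind Figure 5.

```python
def kmeans_sse_1d(values, k, iterations=50):
    """Run a basic 1-D K-means and return the resulting sum of squared error."""
    data = sorted(values)
    # Assumption: deterministic start that spreads the initial centers over the sorted data.
    centers = [data[(2 * j + 1) * len(data) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: (v - centers[j]) ** 2)
            groups[nearest].append(v)
        # Recompute each center as the mean of its group (keep the old center if the group is empty).
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min((v - c) ** 2 for c in centers) for v in data)

# Synthetic temperatures forming four groups around 22, 42, 52, and 62 °C.
temperatures = [base + 0.5 * i for base in (20, 40, 50, 60) for i in range(10)]
sse_by_k = {k: kmeans_sse_1d(temperatures, k) for k in range(1, 7)}
```

Plotting `sse_by_k` against K for this toy data would show a steep drop up to K = 4 and little gain afterwards, which is the shape the elbow method looks for.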

For the data collection methods, as shown in Figure 6, we tried two different numbers of clusters: 20 
and 100. As described in subsection 2.1, the network is modeled based on two algorithms for 
data collection, clustering and tree-based routing. The BS is at the center of the square sensing area. 
In intra-cluster communications, all non-CH sensors send data to their CHs. Then, in inter-cluster 
communications, the data is collected from the CHs and sent to the BS along the tree that connects all the CHs. 


Figure 5. Elbow method for determining the optimal number of clusters (x-axis: K; y-axis: sum of squared 
error, scale 1e7) 

Figure 6. Illustration of intra-cluster and inter-cluster data collection and K-means clustering with 
different numbers of clusters: (a) 20 clusters and (b) 100 clusters 


4. DISCUSSION 

The current trend of scientific research worldwide is to apply human intelligence to analyze, 
process, and forecast the behavior of things in the surrounding world. This section considers three 
aspects: i) the potential of data visualization techniques in empirical applications; ii) the challenges and 
opportunities in implementing the early natural disaster warning system; and iii) the energy efficiency of 
WSNs based on data visualization (DV) techniques. 

Data visualization builds on the human cognitive system, which tends to be more 
efficient when recognizing information represented visually. Hence, it plays a vital role in exploiting big data 
rapidly and effectively in real time through interactive elements and new visualizations. Thanks to 
significant benefits such as better analysis, quick action, identification of ROIs, and insight into the latest 
trends, this technology can be deployed in several applications that directly affect decision-making and even change 
organizations. However, one of the remaining challenges for data visualization is scalability for 
big data. Another is choosing an appropriate format so that the content is presented in the most 
reasonable way. Users need to choose suitable technologies, such as machine learning and artificial intelligence, to 
meet the requirements of exploring the big data sources used for data visualization. 

In recent years, complex weather developments have made extreme weather events harder to 
predict. This requires the warning system to capture weather changes promptly. 
The system should also focus on communication infrastructure for sharing real-time disaster management 
information. An example of such a use case is incorporating the resulting sensor data visualization into a more 
extensive, comprehensive monitoring dashboard, such as one with geological aspects, as demonstrated in 
Figure 7. The sample geolocation data is retrieved from the VAST Challenge 2019 [30]. Sensor readings for each 
location are visualized and supported with interactive features, such as filtering and zooming, for monitoring 
and exploratory tasks. 




Figure 7. Incorporation of sensor data visualization with corresponding simulation geolocation data 


Regarding the energy efficiency of WSNs based on DV techniques, the proposed method can help 
the WSN recognize the sensor nodes in the warning areas. Hence, only those sensors need to communicate with each 
other and the base station (BS) to forward their data. The other nodes in the network can save energy for 
other functions. This can reduce the overall energy consumption of the network. It also reduces the burden 
of big data transmission, which may cause latency or packet loss in network communication. 


5. CONCLUSIONS 

This paper proposes a new model of a natural disaster warning system that uses data visualization 
technology to improve warning systems in general. The system aims to promptly recognize significant signs of 
disasters, such as floods, wildfires, and earthquakes, to provide warnings in advance. Sensing data is 
processed with data visualization techniques that provide more efficient results. The simulation results 
describe the data processing and filtering that show ROIs with different priorities, supporting prompt responses 
and the mitigation of high risks. This work shows promise for deployment in practical fields. In addition, the WSNs 
need to send only the essential data from sensors at the warning levels. Based on that, the networks can save energy 
and prolong their lifetime. Some challenges and opportunities are also identified 
for developing efficient and reliable warning systems and DV technology. In the future, machine learning and 
natural language processing can be combined to reveal significant insights from big data. Thereby, this system 
can be improved for deployment in practice. 